Novel hybrid DNN approaches for speaker verification in emotional and stressful talking environments
Abstract
In this work, we conducted an empirical comparative study of the performance of text-independent speaker verification in emotional and stressful environments. This work combined deep models with shallow architecture, which resulted in novel hybrid classifiers. Four distinct hybrid models were utilized: deep neural network-hidden Markov model (DNN-HMM), deep neural network-Gaussian mixture model (DNN-GMM), Gaussian mixture model-deep neural network (GMM-DNN), and hidden Markov model-deep neural network (HMM-DNN). All models were based on the implemented architecture. The comparative study used three distinct speech datasets: a private Arabic dataset and two public English databases, namely Speech Under Simulated and Actual Stress (SUSAS) and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). The test results of the aforementioned hybrid models demonstrated that the proposed HMM-DNN leveraged the verification performance in emotional and stressful environments. Results also showed that HMM-DNN outperformed all other hybrid models in terms of the equal error rate (EER) and area under the curve (AUC) evaluation metrics. The average resulting verification system based on the three datasets yielded EERs of 7.19%, 16.85%, 11.51%, and 11.90% for HMM-DNN, DNN-HMM, DNN-GMM, and GMM-DNN, respectively. Furthermore, we found that the DNN-GMM model had the least computational complexity compared to all other hybrid models in both talking environments. Conversely, the HMM-DNN model required the greatest amount of training time. Findings also demonstrated that EER and AUC values depended on the database when comparing average emotional and stressful performances.
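The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes each verification trial yields a single similarity score, with higher scores meaning "more likely the claimed speaker". The function names and the toy scores are hypothetical and are not taken from the paper; this is one common way to estimate EER and AUC, not necessarily the authors' procedure.

```python
def far_frr(genuine, impostor, threshold):
    """Return (false acceptance rate, false rejection rate) at one threshold.

    A trial is accepted when its score is >= threshold.
    """
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping every observed score as a threshold.

    The EER is taken as the mean of FAR and FRR at the threshold where
    the two rates are closest to each other.
    """
    candidates = sorted(set(genuine) | set(impostor))
    rates = [far_frr(genuine, impostor, t) for t in candidates]
    far, frr = min(rates, key=lambda pair: abs(pair[0] - pair[1]))
    return (far + frr) / 2


def auc(genuine, impostor):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random genuine score exceeds a random
    impostor score, with ties counted as one half.
    """
    wins = sum((g > i) + 0.5 * (g == i) for g in genuine for i in impostor)
    return wins / (len(genuine) * len(impostor))


# Toy scores (hypothetical, for illustration only).
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]

print(f"EER = {equal_error_rate(genuine_scores, impostor_scores):.4f}")
print(f"AUC = {auc(genuine_scores, impostor_scores):.4f}")
```

A lower EER and a higher AUC both indicate better separation between genuine and impostor trials, which is the sense in which the abstract ranks the four hybrid models.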
Similar resources
Talking condition recognition in stressful and emotional talking environments based on CSPHMM2s
This work aims to exploit Second-Order Circular Suprasegmental Hidden Markov Models (CSPHMM2s) as classifiers to enhance talking condition recognition in stressful and emotional talking environments (two completely separate environments). The stressful talking environment used in this work is based on the Speech Under Simulated and Actual Stress (SUSAS) database, while the emotional t...
Speaker identification investigation and analysis in unbiased and biased emotional talking environments
This work aims at investigating and analyzing speaker identification in both unbiased and biased emotional talking environments based on a classifier called Suprasegmental Hidden Markov Models (SPHMMs). The first talking environment is unbiased towards any emotion, while the second talking environment is biased towards different emotions. Each of these talking environments is made up of six dis...
Employing both gender and emotion cues to enhance speaker identification performance in emotional talking environments
Speaker recognition performance in emotional talking environments is not as high as it is in neutral talking environments. This work focuses on proposing, implementing, and evaluating a new approach to enhance the performance in emotional talking environments. The new proposed approach is based on identifying the unknown speaker using both his/her gender and emotion cues. Both Hidden Markov Mod...
Speaker Identification in Emotional Environments
The performance of speaker identification is almost perfect in the neutral environment. However, performance deteriorates significantly in emotional environments. In this work, three different and separate models have been used, tested and compared to identify speakers in each of the neutral and emotional environments (two completely separate environments). Our emotional environments in ...
DNN-Based Speaker Clustering for Speaker Diarisation
Speaker diarisation, the task of answering “who spoke when?”, is often considered to consist of three independent stages: speech activity detection, speaker segmentation and speaker clustering. These represent the separation of speech and nonspeech, the splitting into speaker homogeneous speech segments, followed by grouping together those which belong to the same speaker. This paper is concern...
Journal
Journal title: Neural Computing and Applications
Year: 2021
ISSN: 0941-0643, 1433-3058
DOI: https://doi.org/10.1007/s00521-021-06226-w